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Abstract 

In the present paper we consider the problem of Laplace deconvolution with noisy discrete 
observations. The study is motivated by Dynamic Contrast Enhanced imaging using a bolus 
of contrast agent, a procedure which allows considerable improvement in evaluating the quality 
of a vascular network and its permeability and is widely used in medical assessment of brain 
flows or cancerous tumors. Although the study is motivated by medical imaging application, 
we obtain a solution of a general problem of Laplace deconvolution based on noisy data which 
appears in many different contexts. We propose a new method for Laplace deconvolution which 
is based on expansions of the convolution kernel, the unknown function and the observed signal 
over Laguerre functions basis. The expansion results in a small system of linear equations 
with the matrix of the system being triangular and Toeplitz. The number m of the terms 
in the expansion of the estimator is controlled via complexity penalty. The advantage of this 
methodology is that it leads to very fast computations, does not require exact knowledge of 
the kernel and produces no boundary effects due to extension at zero and cut-off at T. The 
technique leads to an estimator with the risk within a logarithmic factor of m of the oracle risk 
under no assumptions on the model and within a constant factor of the oracle risk under mild 
assumptions. The methodology is illustrated by a finite sample simulation study which includes 
an example of the kernel obtained in the real life DCE experiments. Simulations confirm that 
the proposed technique is fast, efficient, accurate, usable from a practical point of view and 
competitive. 
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Figure 1: DCE-CT experiment and contrast agent circulation. The patient body is 
materialized by the mixed arrow. 
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1 Introduction 

Cancers and vascular diseases present major public health concerns. Considerable improvement 
in assessing the quality of a vascular network and its permeability have been achieved through 
Dynamic Contrast Enhanced (DCE) imaging using a bolus of contrast agent at high frequency such 
as Dynamic Contrast Enhanced Computer Tomography (DCE-CT), Dynamic Contrast Enhanced 
Magnetic Resonance Imaging (DCE-MRI) and Dynamic Contrast Enhanced Ultra Sound (DCE- 
US). Such techniques are widely used in medical assessment of brain flows or cancerous tumors (see, 
e.g., Cao et al, 2010; Goh et at, 2005; Goh and Padhani, 2007; Cuenod et al, 2006; Cuenod et 
al., 2011; Miles, 2003; Padhani and Harvey, 2005 and Bisdas et al, 2007). This imaging procedure 
has great potential for cancer detection and characterization, as well as for monitoring in vivo the 
effects of treatments. It is also used, for example, after a stroke for prognostic purposes or for 
occular blood flow evaluation. 

As an example, below we consider a DCE-CT experiment that follows the diffusion of a bolus of 
a contrast agent injected into a vein. At the microscopic level, for a given voxel of interest having 
unit volume, the number of arriving particles at time t is given by /3AIF(i), where the Arterial 
Input Function (AIF) measures concentration within a unit volume voxel inside the aorta and f3 is 
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a proportion of the AIF which enters the tissue voxel. Denote the number of particles in the voxel 
at time t by Y(t) and the random lapse of time during which a particle sojourns in the voxel by S. 
Assuming sojourn times for different particles to be i.i.d. with c.d.f. F, one obtains the following 
equation for the average number of contrast agent particles at the moment t 

EY(t)=[ P AlF(t - t) dr- f /3 AIF(t - r) P(S < r) dr = [ (3 AIF(t - r)(l - F(r))dr, 
Jo Jo Jo 

V „ ' S „ ' 

arrived before time t left before time t 

where the expectation is taken under the unknown distribution of the sojourn times. In reality, 
one does not know EY(t) and has discrete noisy observations 

Y(t i )=EY{t i )+ae i . 

Medical doctors are interested in a reproducible quantification of the blood flow inside the tissue 
which is characterized by f(t) = (3(1 — F(t)) since this quantity is independent of the concentration 
of particles of contrast agent within a unit volume voxel inside the aorta described by AIF(t). 
The sequential imaging acquisition is illustrated by Figure [TJ The contrast agent arrives with the 
oxygenated blood through the aorta (red arrow) where its concentration, AIF, within unit volume 
voxel is first measured when it passes through the CT cross section (red box). Subsequently, the 
contrast agent enters the arterial system, and it is assumed that its concentration does not change 
during this phase. The exchange within the tissue of both oxygen and contrast agent occurs after 
the arterial phase and the concentration of contrast agent during this exchange is measured in all 
tissue voxels (grey voxel in the zoom) inside the CT cross section. Later the contrast agent returns 
to the venous system with the de-oxygenated blood (blue arrow). 

To complete description of this experiment, one has to take into account that there is a delay 
5 between the measurement of the contrast agent concentration inside the aorta (first cross of the 
CT section) and its arrival inside the tissue. This leads to the following complete model: 

Y(U)= f l pkW(ti-T)(l-F{r))dT + ae h i = l,...,n. (1.1) 
Jo 

The value of delay 5 can be measured with a small error using the decay between the jumps after 
the injection of the contrast agent inside the aorta and the tissue. Unfortunately, evaluation of 
the proportion f3 is a much harder task which is realized with a larger error. In the spirit of 



complete model (1.1) for DCE-CT experiments, one can consider a more general model of Laplace 



convolution equation based on noisy observations which presents a necessary theoretical step before 



obtaining medical answers provided by model (1.1). 



Indeed, for a known value of 6, equation ( 1.1 ) reduces to a noisy version of a Laplace convolution 
equation 



y(U) = 9(U-r)f(T)dT + aei, i = l,...,n, (1.2) 
Jo 
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where function g is considered to be known, / is a function of interest, measurements y{U) are 
taken at points < t% < ... < t n < T < oo and ei are i.i.d. iV(0,l). The corresponding noiseless 
version of this equation can be written as 

q(t)= I g(t-r)f(T)dr, t > 0. (1.3) 



Formally, by setting g(t) = f(t) = for t < 0, equation (1.3) can be viewed as a particular case 
of the Fredholm convolution equation 

q(t)= f g(t-r)f(r)dr, (1.4) 

J a 

where a = — oo and b = oo for Fourier convolution on a real line and — oo < a < b < oo for circular 



convolution. Discrete stochastic version of equation (1.4) 

y( t i)= 9(U - i~)f(r)dT + aa, i = l,...,n, (1.5) 

J a 

known also as Fourier deconvolution problem, has been extensively studied in the last thirty years 
(see, for example, Carroll and Hall, 1988; Comte, Rozenholc and Taupin, 2007; Delaigle, Hall and 
Meister, 2008; Diggle and Hall, 1993; Fan, 1991; Fan and Koo, 2002; Johnstone et al, 2004; Pensky 
and Vidakovic, 1999; Stefanski and Carroll, 1990, among others). However, such an approach to 



solving (1.3) and (1.2) is very misleading. 

Indeed, since one does not have data outside the interval [0,T] and since function f(t) may 
not vanish fast enough as t — > oo, one cannot apply Fourier transform on the whole real line since 
Fourier transform is defined for only integrable or square integrable functions. Application of the 
discrete Fourier transform (DFT) on the finite interval [0, T] is useless since the kernel g is not 



periodic. Consequently, convolution in equation (1.3) is not circular and, hence, it is not converted 
into a product by DFT. 

The issue of having measurements only on the part t < T of half line (0, oo) does not affect 
the Laplace deconvolution since it exhibits causality property: the values of q(t) for < t < T 
depend on values of f{t) for < t < T only and vice versa. 

The mathematical theory of (noiseless) convolution type Volterra equations is well developed 



(see, e.g., Gripenberg et al. 1990) and the exact solution of (1.3) can be obtained through Laplace 
transform. However, direct application of Laplace transform for discrete measurements faces serious 
conceptual and numerical problems. The inverse Laplace transform is usually found by application 
of tables of inverse Laplace transforms, partial fraction decomposition or series expansion (see, e.g., 
Polyanin and Manzhirov, 1998), neither of which is applicable in the case of the discrete noisy 
version of Laplace deconvolution. Only few applied mathematicians and researchers in natural 



sciences took an effort to solve the problem using discrete measurements in the left hand side of ( 1.4 ). 



Since the problem arises in medical imaging, few scientists put an effort to solve equation ( 1.1 ) using 
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singular value decomposition (SVD) with the subsequent application of Tikhonov regularization 
(see, e.g., Axel (1980), Ostergaard et al. (1996) and an extensive review in Fieselmann et al. 
(2011)). In fact, SVD has been widely used in the context of DCE imaging since mid-nineties. The 
technique, however, is very computationally unstable, especially, in the presence of recirculation of 
contrast agent. For this reason, SVD has been mostly used in the simplified framework of brain 
imaging due to the presence of white barrier which prevents circulation of contrast agent outside 
blood vessels. Ameloot and Hendrickx (1983) applied Laplace deconvolution for the analysis of 
fluorescence curves and used a parametric presentation of the solution / as a sum of exponential 
functions with parameters evaluated by minimizing discrepancy with the right-hand side. In a 
somewhat similar manner, Maleknejad et al. (2007) proposed to expand the unknown solution over 
a wavelet basis and find the coefficients via the least squares algorithm. Lien et al. (2008), following 
Weeks (1966), studied numerical inversion of the Laplace transform using Laguerre functions. 



Finally, Lamm (1996) and Cinzori and Lamm (2000) used discretization of the equation (1.3) 
and applied various versions of the Tikhonov regularization technique. However, in all of the above 
papers, the noise in the measurements was either ignored or treated as deterministic. The presence 



of random noise in (1.2) makes the problem even more challenging. 



For the reasons listed above, estimation of / from discrete noisy observations y in (1.2) requires 
extensive investigation. Unlike Fourier deconvolution that has been intensively studied in statistical 
literature (see references above), Laplace deconvolution received very little attention within 
statistical framework. To the best of our knowledge, the only paper which tackles the problem 
is Dey, Martin and Ruymgaart (1998) which considers a noisy version of Laplace deconvolution 
with a very specific kernel of the form g(t) = be~ at . The authors use the fact that, in this case, 



the solution of the equation (1.3) satisfies a particular linear differential equation and, hence, can 
be recovered using q(t) and its derivative q'(t). For this particular kind of kernel, the authors 
derived convergence rates for the quadratic risk of the proposed estimators, as n increases, under 
the assumption that the s-th derivative of / is continuous on (0, oo). However, in Dey, Martin and 
Ruymgaart (1998) it is assumed that data are available on the whole positive half-line (i.e. T = oo) 
and that s is known (i.e., the estimator is not adaptive). 

Recently, Abramovich et al. (2012) studied the problem of Laplace deconvolution based on 
discrete noisy data. The idea of the method is to reduce the problem to estimation of the unknown 
regression function, its derivatives and, possibly, some linear functionals of these derivatives. The 
estimation is carried out using kernel method with the Lepskii technique for the choice of the 
bandwidth (although it is mentioned in the paper that other methodologies for the choice of 
bandwidth can also be applied). The method has an advantage of reducing a new statistical problem 
to a well studied one. However, the shortcoming of the technique is that it requires meticulous 
boundary correction and is strongly dependent on the knowledge of the kernel g. Indeed, small 
change in the kernel may produce significant changes in the expression for the estimator. 
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In the present paper we suggest a method which is designed to overcome limitations of the 
previously developed techniques. The new methodology is based on expansions of the kernel, 



unknown function / and the right-hand side in equation (1.2) over the Laguerre functions basis. 
The expansion results in a small system of linear equations with the matrix of the system being 
triangular and Toeplitz. The number of the terms in the expansion of the estimator is controlled via 
complexity penalty. The advantage of this methodology is that it leads to very fast computations 
and produces no boundary effects due to extension at zero and cut-off at T. The technique does 
not require exact knowledge of the kernel since it is represented by its Laguerre coefficients only 
and leads to an estimator with the risk within a logarithmic factor of m of the oracle risk under no 
assumptions on the model and within a constant factor of the oracle risk under mild assumptions. 
Another merit of the new methodology includes the fact that, since the unknown functions are 
represented by a small number of Laguerre coefficients, it is easy to cluster or classify them for 
various groups of patients. Simulation study shows that the method is very accurate and stable 
and easily outperforms SVD and kernel-based technique of Abramovich, Pensky, and Rozenholc 
(2012). 

The rest of the paper is organized as follows. In Section [2] we derive the system of equations 
resulting from expansion of the functions over the Laguerre basis, study the effect of discrete, 
possible irregularly spaced data and introduce selection of model size via penalization. Corollary [T] 
indeed confirms that the risk of the penalized estimator lies within a logarithmic factor of m of 
the minimal risk. In Section [3] we obtain asymptotic upper bounds for the risk and prove the risk 
lies within a constant factor of an oracle risk. The proof of this fact rests on nontrivial facts of 
the theory of Toeplitz matrices. Section [4] provides a finite sample simulation studies. Finally, 
Section [5] discusses results obtained in the paper. Section [6] contains proofs of the results in the 
earlier sections. 

2 Laplace deconvolution via expansion over Laguerre functions 
basis 

2.1 Relations between coefficients of the Laguerre expansion 



One of the possible solution of the problem (1.2) is to use Galerkin method with the basis 



represented by a system of Laguerre functions. Laguerre functions are defined as 

Mt) = V2^e- at L k (2at), A: = 0,1,..., (2.1) 
where L^{t) are Laguerre polynomials (see, e.g., Gradshtein and Ryzhik (1980)) 



k 

j=0 ' 
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It is known that functions </>&(•)> k = 0, 1, . . ., form an orthonormal basis of the L 2 (0, oo) space and, 
therefore, functions /(•), g(-), q(-) and y(-) can be expanded over this basis with coefficients f( k \ 
g( k ) ) g( fc ) anc j y( fc ) 5 = 0, . . . , oo, respectively. By plugging these expansions into formula (1.3), we 
obtain the following equation 

ft 



(2.2) 



k=0 k=0 j=0 J ° 

It turns out that coefficients of interest f^ k \ k = 0, 1, . . . , can be represented as a solution of 
an infinite triangular system of linear equations. Indeed, it is easy to check that (see, e.g., 7.411.4 
in Gradshtein and Ryzhik (1980)) 

(f> h (x)<t>j(t - x)dx = 2ae~ at [ L k (2at)Lj(2a{t - x))dx = (2a)~~ 1/2 [(f>k+j{t) - 4> k +j+i(t)}. 



Hence, equation (2.2) can be re- written as 



k-l 



(0l 



k=0 



k=0 



1=0 



Equating coefficients for each basis function, we obtain an infinite triangular system of linear 
equations. In order to use this system for estimating /, we define 

TO— 1 

fm{x)= £/ (fc) &(ar), (2.3) 
k=0 

approximation of / based on the first m Laguerre functions. The following Lemma states how the 



coefficients in (2.3) can be recovered. 

Lemma 1. Let f m , g m and q m be m- dimensional vectors with elements f^ k \ g^ and q^ k \ 
k = 0, 1, . . . , m — 1, respectively. Then, for any m, one has q m = G m f m where G m is the lower 
triangular Toeplitz matrix with elements 

( 



G (i,j) = 

Hence, f(x) can be estimated by 



(2a)-V2 ff (0) ) if i=ji 

(2a)- 1 / 2 (g^ - g^~% if j < i, 
0, if j > i. 



(2.4) 



TO— 1 



(2.5) 



fc=0 



where f m = G rr ^q m and q m is an unbiased estimator of the unknown vector of coefficients q m . 
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2.2 Recovering Laguerre coefficients from discrete noisy data 

Unfortunately, unlike some other linear ill-posed problems, data does not come in the form of 
unbiased estimators of the unknown coefficients q^ k \ k = 0, 1, . . . , m — 1. Below, we examine how 



the length of the observation interval T and spacing of observations t{ in equation (1.2) affect the 
system of equations in Lemma [TJ 

Let P : [0, T] —> [0, T] be a function generating observations in ( 1.2 ) such that P is a continuously 
differentiable strictly increasing function 

P(0) = 0, P{T) = T, P(ti) = iT/n, i = l,...,n. (2.6) 



Under conditions (2.6), P is a one-to-one function and, therefore, has an inverse P . 



Choose M large enough that the bias in representation (2.3) of / by /m is very small and form 
an (n x M) matrix §>m with elements $(*> fc ) = 4>k{ti)i * = 1,. ■ ■ ,n, k = 0, . . . , M — 1. Let zm be 
the M-dimensional vector with elements z^> = (y, </)j), j = 0, . . . , M — 1. Then, it follows that 

M-l 

y(U)= z®<h(U) = (*mz M )®, i = l,...,n. 

1=0 

If y and h are n-dimensional vectors with components y(U) and q(U), i = 1, . . . ,n, respectively, 
then the vectors q M and zm of the true and the estimated Laguerre coefficients of q{x) can be 
represented, respectively, as 

q M = (S^Af) -1 *!^ *M = (^l/^A/)' 1 *!/!/. (2.7) 
Let us examine matrix Note that, for any k and /, 

(*^*m) (M) = y> fc (P-^T/n))<M^H^/n)) « nT- 1 F <j> h (p-\x))<k{P-\x))dx 
i=i 

= nT- 1 / ^ k {t)^)i{t)p{t)dt 



where p(t) = P'(t). It follows from the above that matrix <J?^-<l?jv/ should be normalized by a factor 
n v T. Indeed, if points ti are equispaced on the interval (0, T], then, for n and T large enough, 
(^^j^m ) ~ nT^ 1 Im where Jjvf is the M-dimensional identity matrix. Hence, in what follows, we 
are going to operate with matrix Am = Tn _1 ($^f and its inverse 

n M = (Am)- 1 = uT-^&m&m)- 1 . (2.8) 

Let e be the vector with components e(ij), i = l,...,n, and £ M = y/n/T ($' m Q>m)~ 1 &m e - 
Then, the vector f M of the true Laguerre coefficients of the unknown function / satisfies the 
following equation 

ZM = G M fM + °VTj^tiMi eM~N(0,Sl M ). (2.9) 



If points U are equispaced on the interval (0, T] and both n and T are large, then, in ( 2.9 ) , £Im ~ I 



M- 
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2.3 Model selection and oracle risk 



Equation (2.9) implies that one can estimate unknown vector f M by f M = G m Zm- However, 



since the value of M is large, the variance of this estimator 

a 2 T 



n 



where A~ T = (A T )~ l 



-1\T 



for any matrix A, will be too large while the bias in the 



representation (2.3) of / by Jm will be very small. Hence, in order to balance the bias and 



the variance components of the error, one needs to choose the best possible number m of Laguerre 



functions in the representation (2.3) of /, i.e., choose the model size. 



In order to achieve a required balance between the bias and the variance components of the 
error, consider a collection of integer indices Ai n = {1, . . . , M} where M < n may depend on n 
and, for m E M. n , the associated subspaces S m C R M defined by 

tGSm if t = (t (0) ,t (1) ,...,t (m - 1) ,0,0,...,0) T . 

Let us denote by z m , q m , f m , £ m and f m the M-dimensional vectors where the first m elements 
coincide with the elements of m-dimensional vectors z m , q m , f m , £ m and f m respectively, and the 
last (M — m) elements are identical zeros. For each m G M n , evaluate 



(G r , 



G ,} z 



M " m 



and denote 



Q, 



G m ft m G m . 



(2.10) 



For the estimator f m of / given by (2.5) with the vector of coefficients f m , the bias- variance 



decomposition of the mean squared error is of the form 

ML - ft = \\fm- ft + ° 2 Tn- 1 Tr(Q r 



(2.11) 



where the bias term \\f m — f\\ 2 = Y^jLmif ) 2 ^ s decreasing and the variance term a 2 Tn 1 Tr(Q m ) 
is growing with m. The smallest possible risk, the so-called oracle risk, is obtained by minimizing 



the right-hand side of expression (2.11) with respect to m: 

R oracle = min E||/ m - ff = min [||/ m - /|| 2 + a 2 TrT l n Tr(Q m )] . (2.12) 
Hence, the objective is to choose a value of m which delivers an estimator of the unknown function 



f(x) with the risk as close to the oracle risk (2.12) as possible. Since the bias in the right-hand 



side of expression (2.11) is unknown, in order to attain this goal, one can use a penalized version 



of estimator (2.5) as it is described in the next section. 
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2.4 Selection of model size via penalization 

For any vector t 6 R , we define contrast as 

1 2 n {t) = \\tf-2{t,G- h }z M ) (2.13) 

and note that for t 6 S m one has, thanks to the nul coordinates of t and the lower triangular form 
of Gm and G m , 

{t,G M ZM) = (t,G M 2 m ) = (t,f m ). 



Let ||-A||2 = y Tr(A T A) and ||A|| = J A max (A T A) be, respectively, the Frobenius and the spectral 
norm of a matrix A, where Xmax(U) is the largest eigenvalue of U. Denote 



v. 



\\y/Q^\\l = Tr(Qj, P l = \\^Q^f = \ max (Q m ) (2.14) 



where sj Q m is a lower triangular matrix such that (y/ Q m ) T \/Q m = Q m - Assume that f? m grows 
at most polynomially in m, i.e. there exist positive constants a and C p such that 



PL < C p m a . (2.15) 
Choose any constant B > and introduce a penalty 

pen(m) = ^Tn' 1 [(1 + B)v 2 rn + (1 + B" 1 )(2a + 2)p 2 m logm] . (2.16) 



For each m = 1,...,M, construct estimator f m (x) of f{x) of the form (2.5) with coefficients 
f m = (G m ) _1 z m , the augmented version f m satisfies 

fm = arg min 7 ^(t). 

Now, choose m = m where 

rh = arg min |m G M n : 7^(/ m ) + pen(m)| . (2-17) 
The following statement holds. 



Theorem 1. Let condition (2.15) hold for some positive constants a and C p . Then, for any B > 0, 
one has 



^(/™):=IE(ll/a-/|| 2 )< mm 

meMn 



3||/m - /II 2 + 4pen(m) + 16C p V(l + S" 1 ) — 

p mn 



(2.18) 



The proof of this and later statements are given in Section [6} 



Note that the upper bound in Theorem [T] is non-asymptotic and holds for any values of T and 
n and any distribution of points t{, i = 1, . . . , n. 
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In order to evaluate relative precision of the estimator constructed above, we shall compare 
its risk with the oracle risk (2.12). Since < for any value of m, it follows from Theorem [l] 
that, for any value of m, the risk of the estimator f^ lies within a logarithmic factor of the oracle 
risk, i.e., the estimator is optimal within a logarithmic factor of m. In particular, the following 
corollary holds. 

Corollary 1. Under conditions of Theorem^ 

R(frh) < 16[(1 + B) + (1 + S- 1 )(2a + 2) \ogm Q ]R orade + 16C p V(l + B" 1 ) Tm l n -\ (2.19) 



where mo = mo(n, T) is the value of m delivering the minimum in the right-hand side of (2.18). 



3 Asymptotic upper bounds for the risk and optimality of the 
estimator 

3.1 Assumptions 

Corollary [T] is valid for any function g and any distribution of sampling points, hence, it is true in 
the "worst case scenario". In majority of practical situations, however, increases much faster 
with m than and the risk of the estimator can exceed the oracle risk only by a finite 
factor independent of mo and n. In particular, in what follows, we shall show that, under certain 
conditions, for n large enough and T = T n , the ratio between R(ffh) and R or ade is bounded by a 
constant independent of n. 

For this purpose, assume that function g(x), its Laplace transform G(s) and matrix fl m defined 



in (2.8) satisfy the following conditions 



(Al) There exists an integer r > 1 such that 



dP 



t=Q 



0, if j = 0, ...,r-2, , N 

3.1 

B r ± 0, if j = r - 1. 



(A2) g G Li[0, oo) is r times differentiable with g^ £ Li[0,oo). 

(A3) Laplace transform G(s) of g has no zeros with nonnegative real parts except for zeros of the 
form s = oo + ib. 

(A4) There exists no such that for n > no, eigenvalues of matrix fi m are uniformly bounded, i.e. 

< Ai < A min (fi m ) < A max (fi m ) < A 2 < oo (3.2) 
for some absolute constants Ai and A 2 . 
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3.2 Introduction to theory of banded Toeplitz matrices 

The proof of asymptotic optimality of the estimator relies heavily on the theory of banded 
Toeplitz matrices developed in Bottcher and Grudsky (2000, 2005). In this subsection, we review 
some of the facts about Toeplitz matrices which we shall use later. 

Consider a sequence of numbers {bk} ( j? = _ OQ such that ^2kL_ OQ \bk\ < oo. An infinite Toeplitz 
matrix T = T{b) is the matrix with elements Tjj = bi-j, i, j = 0, 1, . . .. 

Let C = {z £ C : \z\ = 1} be the complex unit circle. With each Toeplitz matrix T{b) we can 
associate its symbol 

oo 

b(z) = b kZ h , zeC. (3.3) 

k=— oo 

oo 

Since, B{6) = b(e 10 ) = b k e lk6 ', numbers bu are Fourier coefficients of function B{9) = b{e %e ). 

k=— oo 

There is a very strong link between properties of a Toeplitz matrix T(b) and function b(z). In 
particular, if b(z) ^ for z G C and wind(fc) = Jb, then b(z) allows Wiener- Hopf factorization 
b(z) = b-(z) b + (z) z Jb where b + and 6_ have the following forms 



z k 



k=0 k=0 

(see Theorem 1.8 of Bottcher and Grudsky (2005)). 

If T(b) is a lower triangular Toeplitz matrix, then b(z) = b + (z) with 6^ = b\~. In this case, 
the product of two Toeplitz matrices can be obtained by simply multiplying their symbols and the 
inverse of a Toeplitz matrix can be obtained by taking the reciprocal of function 6 + (z): 

T(b + d + ) = T(b + )T(d + ), T-\b + ) = T{l/b + ). (3.4) 

Let T m (b) = T m (b + ) € R mxm be a banded lower triangular Toeplitz matrix corresponding to 

m— 1 

the Laurent polynomial b(z) = b^z k . 

k=0 

In practice, one usually use only finite, banded, Toeplitz matrices with elements Tij, i,j = 
0, 1 , . . . , m — 1. In this case, only a finite number of coefficients b^ do not vanish and function b(z) 

K 

in (3.3) reduces to a Laurent polynomial b(z) = bkz k , z £ C, where J and K are nonnegative 



integers, b-j ^ and bx 7^ 0. If b(z) ^ for z£C, then b(z) can be represented in a form 

b(z) = z~ J b K Y[(z - fjLj) Y[(z - v k ) with \^\ < 1, \u k \ > 1. (3.5) 
j=i k=i 

In this case, the winding number of b{z) is wind(fr) = Jq — J. 

Let T m {b) = T m (6 + ) G j^mxm ^ e a k anc i ec i lower triangular Toeplitz matrix corresponding 

m— 1 

to the Laurent polynomial b(z) = b k z k . If b has no zeros on the complex unit circle C and 

k=0 
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wind(fc) = 0, then, due to Theorem 3.7 of Bottcher and Grudsky (2005), T(6) is invertible and 
lim sup ||T~ 1 (6)|| < oo. Moreover, by Corollary 3.8, 

lim \\T~\b)\\ = WT-^W (3.6) 

m— yoo 



3.3 Relation between p m and 

In order to apply the theory surveyed above, we first need to examine function b(z) associated 



with the infinite lower triangular Toeplitz matrix G defined by (2.4) and the Laurent polynomial 
associated with its banded version G m . It turns out that b{z) can be expressed via the Laplace 
transform G(s) of the kernel g(t). In particular, the following statement holds. 

Lemma 2. Consider a sequence {frfc}£Lo with elements bo = g^ and b^ = g^ k > — g^" 1 ^, k = 1, 2, . . . 



where g^ are Laguerre coefficients of the kernel g in (1.2). Then, b^, k > 0, are Fourier coefficients 
of the function 

where G(s) is the Laplace transform of the kernel g(x). 

For any function w(z) with an argument on a unit circle C denote 

llitfllcirc = maxw;(z). 
|*|=1 

The following lemma shows that indeed p^logm = o(t>^J asm-> oo. 



Lemma 3. Let b(z) be given by (3.1), i.e., b(z) = G(a(l + z)/(l — z)), \\z\\ = 1. Denote 



w(z) = (1- z)- r b{z), W - l {z) = {l-z) r b- l {z), \\z\\ = l. (3.8) 

Then, under assumptions (Al)-(A^), w(z) and w~ 1 (z) have no zero on the complex unit circle 
and, for m large enough, one has 

^(IhlUrc)" 1 m 2r+1 < v 2 m <2C r \ 2 \\w- 1 \\ ctrc m 2r+ \ (3.9) 
mp 2 m < C(r,w) v 2 m , (3.10) 



where p m and are defined in (2.14), and A2 are given by (3.2) and C(r,w) is an absolute 



constant which depends only on w and r: 

C(r,w) = 2* r+1 [(r-iy.} 2 {\\w\\ circ fluT 1 \\ crrc ) 2 A 2 /A x . 
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3.4 Asymptotic optimality of the estimators 

Note that Lemma [3] implies that, in (2.16), p^logm = o(u^J as m — > 00, so that the second term 
in (2.16) is of smaller asymptotic order than the first term. Consequently, as n — > 00, T/n — > 0, 
the right-hand side of ( |2.18[ ) is of the same asymptotic order as the oracle risk (2.12), so that, 
combination of Theorem [I] and Lemma [3] leads to the following statement. 



Theorem 2. Let condition (2.15) hold for some positive constants a and C p . Then, under 
assumptions (Al)-(A4), for an estimator of f with penalty given by equation (2.16) with B > 0, 
as n — )• 00, 

R(ffh) 



provided T/n — >• as n 



00. 



R, 



oracle 



< 16(1 + B)(1 + o(l)), 



(3.11) 



= argmm m 



/|| 2 + a 2 Tn- 1 vK\. Then, due to bounds (3.9) on v^, one 



Proof Let mo 

has mo — > 00 and 

as m —7- 00 which, in combination with Theorem [TJ completes the proof. 



n 



as T/n — > 0. Hence, it follows from Lemma 3jthat logm = o(^) 



Remark 1. The theory above is valid for T being finite as well as for T = T n — > 00 as long as 
T n /n — > as n — > 00. Indeed, the natural consequence of T being finite is that the bias term 
11/ — /m|| 2 might be relatively large due to mis-representation of / for t > T. However, since both 
the risk of the estimator R(ffh) and the oracle risk Roracie are equally affected, Theorem [2] remains 
valid whether T = T n grows with n or not. 



Remark 2. The right hand side of formula (3.11) is strictly increasing in B, so, the smaller B is, 



the closer the risk to the optimal oracle risk as n — > 00. Note, however, that choosing asymptotically 



small value for B (e.g, B = 1/n) can make the second term in the penalty (2.16) dominant, so that 



(3.11) will become invalid. 



4 Simulation study 

In order to evaluate finite sample performance of the methodology presented above, we carried out 
a simulation study. We chose three versions of the kernel g, normalized to have their maximum 
equal to 1: 

• g\ (t) which coincides with the fit of an arterial input function (AIF) for real data obtained 
in the REMISCAN (2012) study. The real-life observations of an AIF corresponding to 
kernel g\ coming from one patient in the REMISCAN study [29] and fitted estimator of g\ 
using an expansion over the system of the Laguerre functions with M = 18 are presented 
in Figure [2] One can see clearly two behavioral patterns : initial high frequency behavior 



14 



caused by injection of the contrast agent as a bolus and subsequent slow decrease with regular 
fluctuations due to the recirculation of the contrast agent inside the blood system. 



aif = g 




40 60 80 time 100 



Figure 2: Observations of an arterial input function (AIF) corresponding to kernel g coming from 
one patient in the REMISCAN study [29] and fitted estimator of g using an expansion over the 
system of the Laguerre functions with M = 17. 



• 92{t) = t e which aims to reproduce a long injection of contrast agent; 

• 9z{t) = i 7 (100 + i) -1 exp (— 0.9i 3//4 ) which describes an injection with a recirculation of the 
contrast agent inside the blood network. 

Simulations were carried out for five different test functions /: 

• ft(x) = exp(-O.lx), 

• h{x) = exp(-0.6a;), 

• f 3 ( x ) = 0.5 exp(-O.lx) +0.5 exp(-0.6x), 

• fi{x) = 1 — IG(2; 0.5) where IG(2;0.5) is the cdf of the gamma distribution with the shape 
parameter 2 and the scale parameter 0.5, 



+ 1)^/3. 



The value of a in formula (2.1) was chosen so that to provide the best possible fit for the kernel g 



when the number of terms in the expansion of g is maximum, i.e. m = M. 
The functions / and g are shown in Figure [3} 



15 



time 



ao time 100 



Figure 3: Test functions : (left) the kernel functions g - (right) the estimated functions / 



We illustrate performance of our methodology using kernel g\ and test functions f±, f^. 
Figure [4] shows the observations and the true convolution for a medium signal-to- noise ratio 8. The 
associated estimators are presented in Figure [5j Here SNR is defined as 

SNR = ^Var(/)/(a 2 Var(<?)) 

where, for any function (p, we define Var(y>) as 



T 



Var(<£>) = / <p (x)dx 



o 



T 



n 2 



(p(x)dx 







The idea of defining of SNR in this manner is to remove the effect of convolution with g. This 
corresponds to SNR of Abramovich and Silverman (1998). 

For simulations with gi(t) we chose /3 = 1 in ( |1.1[ ). We should mention that the value of (3 is 
usually unknown in real-life situations. However, since in equation ([11]), f(t) = (3(1 - F(t)) where 
F(t) is a cdf, one knows that /(0) = j3 and, therefore, can estimate f3 as ft = /(0). 

We executed simulations with T = 100, M = 11, two values of sample sizes, n = 100 and 
n = 200, and three signal-to-noise ratios (SNR), namely, SNR = 5,8 and 15. The value 5 
corresponds to real-life conditions, smaller values 8 and 15 correspond to noise level attained after 
the first denoising step as described in Rozenholc and ReiB (2012). 

For a given trajectory, the empirical risk was evaluated as 

n r i2 
r(f) = n~ l £ /(*) - f(U)\ (4.1) 
i=i 
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20 40 60 80 100 



20 40 60 80 100 




Figure 4: Observations and true convolutions of kernel g\ with (unknown) functions fi, . . . , f& 



Table 1: The values of empirical risk increased by a factor of 100: 100 x R(f). Empirical risks are 
computed for 400 samples for g\ , gi and gs and five functions of interest fx, . . . , /g . 



100 x R(f) 


n = 100 


n = 200 


SNR> 




/i 


h 


h 


U 


h 


h 


h 


h 


h 


h 




9i 


0.33 


0.43 


0.065 


0.25 


5.1 


0.30 


0.22 


0.052 


0.22 


4.2 


5 


92 


0.27 


0.37 


0.063 


0.25 


5.1 


0.22 


0.15 


0.051 


0.16 


4.0 




93 


0.16 


0.57 


0.085 


0.37 


6.1 


0.14 


0.44 


0.061 


0.36 


5.5 




9i 


0.31 


0.20 


0.061 


0.18 


4.0 


0.23 


0.12 


0.051 


0.070 


3.9 


8 


92 


0.23 


0.18 


0.062 


0.11 


4.0 


0.06 


0.14 


0.050 


0.031 


3.6 




93 


0.15 


0.57 


0.079 


0.37 


5.3 


0.13 


0.41 


0.059 


0.357 


4.8 




9i 


0.162 


0.15 


0.066 


0.075 


4.0 


0.025 


0.085 


0.050 


0.032 


2.6 


15 


92 


0.022 


0.17 


0.061 


0.037 


3.4 


0.012 


0.115 


0.050 


0.027 


2.8 




93 


0.142 


0.43 


0.077 


0.322 


4.9 


0.132 


0.182 


0.058 


0.154 


4.6 



and the average empirical risk, denoted R(f), is obtained by averaging the values of r(/) over 400 
simulation runs. We used B = 1/2 in penalty (2.16) and reduced the constant 4 in the penalty 
to 1.5 since this constant is an upper bound due to a triangular inequality. The value of a in 



(2.16) is chosen using condition (2.15) as follows. Since 21ogp m < logC p + alogm, a is selected 
by regressing 2 log p m onto log m for m = 0, . . . , 6. 
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Figure 5: Estimators (thick plain line) and (unknown) functions (thick dotted line) fi, . . . , f^. 
Other fine dashed lines represent the estimates for m = 0, . . . , 5. The selected value of m is given 
by rh. 



Results of simulations are presented in Table [TJ Table [T] verifies that indeed the methodology 
proposed in the paper works exceptionally well for functions fi, i = 1, . . . , 4, and is still quite precise 
for test function f§ for which Fourier transform does not even exist. The table demonstrates the 
effect of choosing parameter a: for function f% and n = 100, the average empirical risk does not 
decline when SNR grows. This is due to the bias problem arising from the fact that is the sum 
of two exponentials and we fit only one value of a. 



We also carried out a limited comparison of the method suggested above with the technique 
presented in Abramovich, Pensky and Rozenholc (2012). The comparison is performed using just 
one simple example where f(x) = 0.2 exp(— 0.5x) +0.8 exp(— 2x) and g(t) = i 2 (i + l)e _< (see Figure 



|6|). In this example, the value of r in (3.1) is r = 3 and we used n = 200, a = 0.025 and T = 15. 
It is easy to see from Figure [7] that the Laguerre functions based estimator outperforms the kernel 
estimator of Abramovich, Pensky and Rozenholc (2012) and also it does not exhibit boundary 
effects. 

Finally, we compared our method to Singular Value Decomposition (SVD) techniques as 
described in the context of DCE imaging in Ostergaard et al. (1996) and Fieselmann et al. (2011). 
We tried various regularization methods including thresholding and Tikhonov regularization with 
rectangular or trapezoid rules for approximation of the convolution integral and played with the 
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5 10 time 15 5 10 tim e 15 



Figure 6: Functions g (left) and / (right) for comparison between deconvolution methods using 
adaptive kernels and penalized Laguerre functions. 




Figure 7: Comparison between deconvolution using kernel method and using penalized Laguerre 
functions : (left) q = f*g and observations; (right) estimates / : Penalized Laguerre deconvolution 
(thick plain line) - adaptive kernel estimation (thick dashed line) - true function / (dotted line). 

constant of regularization in order to find manually the best possible tuning in each case. In Figure 
[8] we display one of the best reconstructions which we managed to achieve with the SVD approach. 
One can clearly see how this technique fail to adequately recover unknown function / : first, it 
introduces a shift, second, it produces estimators which fails to be a decreasing functions (recall 
that the function of interest in DCE imaging experiments is f(t) = (3(1 — F(t)) where F(t) is a cdf 
and we use f3 = 1 in our simulations). One reason for these shortcoming is that SVD estimates are 
smooth and degenerate at 0. As it is noted in the papers on DCE imaging (see, e.g., Fieselmann et 
al. (2011)), for convolution kernels corresponding to recirculation of the contrast agent (which is a 
common real- life scenario), SVD fails completely and needs some extra tuning in order to obtain 
quite poor results similar to those presented in Figure |8j 
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Figure 8: Comparison between deconvolution using SVD method with Tikhonov regularization and 
penalized Laguerre functions. SNR=8, n = 200, / = / 1; (upper-left) the kernel g(t) = t 3 exp(-i/3); 
(bottom-left) observations; (right) estimates / : Penalized Laguerre deconvolution (thick plain line) 
- SVD for various regularization constants (fine line) and true function / (dotted line). 



5 Discussion 

In the present paper, we study a noisy version of a Laplace convolution equation. Equations of this 
type frequently occur in various kinds of DCE imaging experiments. We propose an estimation 
technique for the solutions of such equation based on expansion of the unknown solution, the kernel 
and the measured right-hand side over a system of the Laguerre functions. The number of the terms 
in the expansion of the estimator is controlled via complexity penalty. The technique leads to an 
estimator with the risk within a logarithmic factor of m of the oracle risk under no assumptions on 
the model and within a constant factor of the oracle risk under mild assumptions. 

The major advantage of the methodology presented above is that it is usable from a practical 
point of view. Indeed, the expansion results in a small system of linear equations with the matrix 
of the system being triangular and Toeplitz. The exact knowledge of the kernel is not required: 
the AIF curve can be fitted using data from DCE-CT experiments as it is shown in Figure [2] This 
distinguishes the present technique with the method of Abramovich, Pensky and Rozenholc (2012) 
(referenced later as APR) which strongly depends on the knowledge of the kernel in general and 



the value of r in (3.1 ), in particular. After that, the method can be applied to any voxel of interest, 
either at the voxel level or using ROI (region of interest) manually drawn by a doctor or obtained 
using any clustering technique. 

The method is computationally very easy and fast (requires solution of a small triangular system 
of linear equations) and produces no boundary effects due to extension at zero and cut-off at T. 
Moreover, application of the technique to discrete data does not require re-fitting the model for 
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each model size separately. On the contrary, the vector of the Laguerre coefficients of the observed 
function is fitted only once, for the largest model size, and then is truncated for models of smaller 
sizes. The complexity of representation of g adjusts to the complexity of representation of / and 
the noise level. Moreover, if g can be represented by a finite expansion over Laguerre functions 
with k terms, the matrix of the system is fe-diagonal. 

The method performs very well in simulations. It is much more precise than the APR technique 
as Figure [7] confirms. In fact, the absence of exhaustive comparisons between the two methods is 
due to the fact that it is very tricky to produce estimators by the APR method, especially, in the 
case of g\ which represents real life AIF. Similarly, as our study and Figure [8] show, the method is 
much more accurate than the SVD-based techniques. 

There are few more advantages which are associated with the use of Laguerre functions basis. 
Since one important goal of future analysis of DCE-CT data is classification of the tissues and 
clustering of curves f(t) = (3(1 — F(t)) which characterize their blood flow properties, representation 
of the curves via Laguerre basis allows to replace the problem of classification of curves by 
classification of relatively low-dimensional vectors. In addition, due to the absence of boundary 
effects, the method allows to estimate classical medical parameters of interest (3 which describes 
the perfusion of blood flow, and also If = J f(s) ds which characterizes the vascular mean transit 
time. These parameters can be estimated by (3 = l//(0) and If = J f(s) ds, respectively. 

The complexity of representation of g is controlled by the choice of parameter a. Parameter 
a is a non-asymptotic constant which does not affect the convergence rates. In practice, one can 
choose a in order to minimize \\g — 3m 1 1 where c/m is a fitted version of g using the first M Laguerre 
functions. Then, the same value of a can be used in representation of the solution /. Our choice of a 
provides a reasonable trade-off between the bias and the variance for majority of kernels considered 
above, including a real life AIF kernel coming from the REMISCAN (2007) study. However, our 
limited experimentation with choices of a shows that there is room for improvement: undeniably, 
fine tuning parameter a can improve estimation precision, especially, in the case when kernel g has 
a strong exponential decay. However, this issue is a matter of future investigation. 
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6 Proofs 



6.1 Proof of Theorem [T] 



Let m, m! £ -M„, t 6 S m i and s 6 5 m . Denote m* = max(m, m') = m V m', r/ m = o^jT jn £ m and 
observe that 



7n(t)-7n(«) = II* "/f - ||* -/f -2(t-a,Gl}rj m .), 



(6.1) 



where / = / M is the vector of the true M first coefficients of function /. Note that, due to 
orthonormality of the Laguerre system, for any to, 



Wfm " /|| 2 = ||/ m " /II 2 + E and " /II 2 = H An " /II 2 + E (/ (i) J • ( 6 - 2 ) 

j=M j=M 

Now, the definition of rh yields that for any to G .M n one has 



which with (6.1), implies 



Jn(frh) + pen(m) < 7„(/ m ) + pen(m), 



|/ m - /|| 2 < ||/ m - /II 2 + Pen(m) + A m , r 



2(/m ~~ fmi^M^m*) ~ pen(m), where to* = m V m. Therefore, using (6.2), we 



Here A m ^ 
obtain that, for any m G M. n i 

Wfm ~ ft < Wfm ~ /II 2 + pen(m) + A m , A 

Note that, due to 2xy < (x 2 /4) + Ay 2 for all x > 0, y > 0, 

A m , A < 2||/ a - /J| sup {t,Gj}fj m *) -pen(m) 

11*11=1 



1 



< Tll/m-/mll + 4 SUp (t, G M fj 

4 *« m * 
11*11=1 



pen(m) 



Now, denote 



G 2 T 

T(m,m') = \{l + B)v 2 m *+2{l + B- 1 ){a + l)\og{m*)p, 

n 



m*J i 



where m* = m V m'. Since, for any to, ||/ ft - / m || 2 < 2\\f m - f\\ 2 + 2||/ m - /|| 2 , then 



A TO)J fj < ^ II /m 



/H 2 +2ll/--/H 2 + 4 



sup {t,G A }rj 



teS„ 



M 'lm* 



r(m, m) 



+ 4r(m, to) — pen(m) 



(6.3) 



(6.4) 



(6.5) 
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Using the fact that 4r(m, m) < pen(m) + pen(m), combining (6.3), (6.4) and (6.5), derive 



II4-/II 2 < ||/ m -/|| 2 +pen(m) + i||4-/|| 2 + ^||/ m -/|| 2 



sup (t,GjJ-fj m *) -T{m,m 

Finally, subtracting [|/^ — f\\ 2 /2 from both sides of the last equation and multiplying both sides 
by 2, obtain 



Wfrn-ff < 3||/ m -/|| 2 + 4pen(m) 



(6.6) 



+ 



sup (t, GJrj m *) 2 - r(m, m) 



Hence, validity of Theorem [T] rests on the following lemma which will be proved later. 



Lemma 4. Let condition (2.15) hold for some positive constants a and C„. Then, for any m and 
any B > 0, one has 



E 



sup (t, G K }rj m *) 2 - r(m, fa) 



t(^S m \yff l ,\\t\\—l 

Proof of Lemma [4] is given in Section 6.3 



< 



2C 2 a 2 T 



mn 



1 + 



B 



6.2 Proofs of Lemmas [2] and |3] 

Proof of Lemma pi To prove this statement, we shall follow the theory of Wiener-Hopf integral 
equations described in Gohberg and Feldman (1974). Denote Fourier transform of a function p{x) 

/oo 
e luJx p(x)dx and observe that 



(a + iuif 



(a — iuj 



ifc+i 



Therefore, elements of the infinite Toeplitz matrix G in (2.4) are generated by the sequence bj, 
j > 0, where 



(2a)- 1 /2( 5 0')_ 5 0'-D ) 



1 

3 doo 



iuj — a 



iuj + a I a 2 + uj 2 



-oo 

, j = 0, 1, ... . 



(6.7) 



Note that | {iuj — a) / (iu + a) \ = 1, so that we can use the following substitution in the integral (6.7): 



ico — a 



ico + a 



UJ 



a(e w + l) asinf 



i(e ie -l) cos6>-l 



, < 9 < 2ir. 
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Simple calculations show that 



so that bj, j E Z, are Fourier coefficients of the function 

Now, let us show that bj = for j < 0. Indeed, if j = —k, k > 0, then 
bj = - I g(u) ( — ) ^ ; <2 = - / 



7r ./-ix, \ioj — a J a 2 + uS 2 it J_ 00 \i(—uj) + aj a? + uj 2 
1 

2^ 



g(a;)[^-(-w) - = / g(x) [<j)k{-x) - 4>k-i{-x)\ dx = 



since = if x < and 4>k{—x) = if x > 0. Hence, function B{9) = b(e te ) has only coefficients 
j > 0, in its Fourier series. Now, to complete the proof, one just needs to note that G(s) = g(is) 
for any s such that Laplace transform G(s) of g exists. 

Proof of Lemma [3]. Let us first find upper and lower bounds on HG^H! = Tr^G^G" 1 ) and 
ll^m 1 !! 2 = ^maxlG^G" 1 ). For this purpose, examine the function 

Denote y = a(z + 1)/(1 — z), so that z = (y — a)/(y + a) and G(y) = b((y — a)/(y + a)). 

Let us show that, under Assumptions (A1)-(A4), b(z) has a zero of order r at 2 = 1 and all 
other zeros of b{z) lie outside the unit circle. 

For this purpose, assume that y = a + i/3 is a zero of G, i.e. G(a + i/3) = 0. Simple calculus 
yields 

y -a 



y + a 



4aa 



(a + a) 2 + /? 2 ' 

so that \z\ = \(y — a)/{y + a)\ < 1 iff a > 0. But, by Assumption (A3), G(y) has no zeros with 
nonnegative real parts, so that a < and \z\ = \(y — a)/(y + a)\ > 1. Therefore, all zeros of 6(2), 
which correspond to finite zeros of G, lie outside the complex unit circle C. 

Assumptions (Al), (A2) and properties of Laplace transform imply that G(s) = s~ r (B r + G r (s)) 
where G r (s) is the Laplace transform of g^ r \t). Hence, 

f 0, if j = 0, ...,r- 1, 

lim s J G(s) = { 
Re [ B r / 0, if j = r, 

so that y = 00 + i/3 is zero of order r of G(y). Since lim (y — a)/(y + a) = 1, b(z) has zero of 

Re y— >oo 

order r at z = 1. 
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Then, b(z) can be written as b(z) = (1 — z) r w(z) where w{z) is defined by formula (3.8) and all 
zeros of w(z) lie outside the complex unit circle. Therefore, w(z) can be written as 

N 

Kz) = C w ]J(z - Q), 0<iV<oo, \Q\>1, (6.8) 

3=1 



W 



where C w is an absolute constant. Since b(z) does not contain any negative powers of z in its 



representation, Jo = and J = in (3.5) and, consequently, wind(u>) = 0. Also, by (3.4) and (3.8), 
one has T _1 (6) = T(6 _1 ) where b' 1 (z) = w^ 1 (z)(l — z)~ r . 

Now, recall that HG^ 1 )! 2 , = H^mC^ -1 ) 111 an< ^ for HG" 1 )) 2 = ||T m (6 -1 )|| 2 . Using relation between 
Frobenius and spectral norms HA1A2H2 < II-A1II2II -A2II f° r anv matrices A\ and A 2 (see, e.g., 
Bottcher and Grudsky (2000), page 116), obtain 



T m (b- L )\\ 2 < ||T m ((l ~ z)- r )\\ 2 \\T m {w 



-In 



IT 



< ||T m ((l-z) 



T m (w~ )\\,(6.9) 



||T m ((l - z)" r )|| 2 < ||r ro (6- 1 )|| 2 ||r m («;)||, ||T m ((l - z)- r )\\ < ||T m (6- 1 )||||T m H||. (6.10) 
Note that (see Bottcher and Grudsky (2005), page 13) 



lim \\T m (w 1 ) 



w 



\circ lim \\T m (w) 

m— >oo 



\W\ 



Also, due to representation (6.8), both w and w 1 are bounded, and, therefore, < \\w 1 \\drc < 00 
and < ||to||cfrc < 00. Denote 



vf(m) = \\T m ((l-z) r )\\ 2 , v s (m) = ||T m ((l - z)~ 



Then, it follows from (3.6), (6.9) and (6.10) that, for m large enough, 



0.5(|MU C )- 2 *>/(m) < ||T m (6- 1 )|| 2 <2||u;- 1 || 2 irc ^(^), 
0.5(\\w\\ circ y 2 v 2 s {m) < IIT^ 1 )!) 2 < 2||uT 1 || 2 irc i/ 2 (m). 



.11) 

.12) 
.13) 



In order to finish the proof, we need to evaluate u 2 (m) and v 2 {rn) and also to derive a relation 
between v^, Pm, Il^m(^ _1 )ll2 an d l|Jm(& _1 )|| 2 - The first task is accomplished by the following 
lemma. 



Lemma 5. Let Vf{m) and v s {m) be defined in (6.11). Then, 

2 -(4r-l) [( r _ 1 ),]-2 m 2r+l < < Q.5m 2 ' ' 



(r!)~ 2 m 2r < v 2 s (m) < m 2r . 
Proof of Lemma [5] is given in Section 6.3 



(6.14) 
(6.15) 



Now, to complete the proof, recall that matrix Q m given by (2.8) is symmetric positive definite, 



so that there exist an orthogonal matrix U m and a diagonal matrix D m . with eigenvalues of ft 
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as its diagonal elements, such that fi m = U m D m Um and fl™ 1 = Uj n D n }U m . Hence, by (2.10), 



(2.14) and Assumption (A4) 

\\T m (b 1 )\\ 2 = T^G^G^) = lh:(G rr ^U' m D m D n ^U m G m T ) 



< WD^WWy^UmG^Wl < \^Tr(G m T n m G m ) = A7/V 



'T„ — \\ y^DmUmG^ {{I < \2\\G m 1 \\l = X 2 \\T m (b 



7 2 
'in 



2 
Pm 



W^D^UmG^f < X 2 \\G m 1 \\ 2 = X 2 \\T m (b- 1 )\\ 2 , 



so that 



Pm < A 2 ||T m (6- 1 )|| 2 , Xi\\T m (b- l )\\ 2 2 < v 2 m < X 2 \\T m (b- 1 ] 



(6.16) 



Combination of (6.12) - (6.16) and Lemma [5] complete the proof. 



6.3 Proofs of supplementary Lemmas 
Proof of Lemma [4l 

The proof of Lemma [4] has two steps. The first one is the application of a x 2 -type deviation 
inequality stated in Laurent and Massart (2000), and improved by Gendre (see Lemma 3.10 of 
Gendre (2009)). The second step consists of integrating this deviation inequality. 

The x 2 - mec i ua hty is formulated as follows. Let A be a p x p matrix A S M p (M) and £ be a 
standard Gaussian vector. Denote v 2 A = Tr(A T A) and p 2 (A) = A max (A T A). Then, for any 
x > 0, 

P (j| AC|| 2 > v\ + 2yJv 2 A p 2 {A)x + p 2 {A)x^j < e~ x . (6.17) 
Now, recall that for t G S m + S m > = S m * where m* = m V m! , one has 

{^i^MVm*) = — ~ , \J Q m * Cm* ) 

where i m * is the m* -dimensional vector formed by the first m* coordinates of t and £ m * is a 
standard m* -dimensional Gaussian vector. Moreover, 



sup (t,GjJ-i] m *) 2 = \\y / Q m *C m *\\ 2 - 



tes m +s m ,, ||t||=i 



Thus, it follows from (6.17) that 



(j^WVOm^CmA? >vl* + ^ Pm*V m * X + Pm*^j < e~* . (6.18) 



For any B > 0, one has 2y/ p^v^x < Bv^* + B 1 p m *x so that 

P (^2 II VQ^Cm* IP > (1 + B)v 2 m , + (1 + B- l )p 2 m ,x) < e 
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Therefore, using definition (6.4) of r(m,m / ), obtain 
E 

\teS m +S m ,,\\t\\=i 

■ - 2 '' 



sup (t, GjJ-fj m *) 2 — r(m, m') ] = E (||\/Q^Cm*l| 2 ~~ T(m,m')\ 
\Us m +S m ,,\\t\\=i / + + 

/•+00 / 2j \ 

J P (j| v/Q^Cm* II 2 " — [(! + + 2 (! + B ~ + !) lo s(™*)/4*] > £j 



Changing variables 



2a 2 T a 2 T 
~ (a + 1)(1 + B" 1 ) log(m*)p^ + £ = (1 + B- X )f^.x 



n 



n 



and application of (6.18) yield 

E ( sup (t, Gj}rj m «) 2 - r(m, m') J < (1 + -B 
\te5 m +s m ,,||t||=i " / , 



-In/5 



«a 2 T 



+oo 



n J2(q+1) log(m*) 
(l + B -l)&^( m *)-2(a+l) 



n 

a 2 T 



< C p {l + B- 1 ) — (m*) 
n 



*\-2 



Recall that m* = m V m and obtain 



E 

and 



sup (t, GJi] m *) 2 - r(m, m) 



tes„ 



< ^ El sup {t,G A }f} mVm ,) 2 -r(m,m') 

+ m'eMn V*e5 m +5 m /,||t||=l 



E ■( 



sup (t, G^fj m *) 2 - r(m, m') < C p (l + S 

teS m +S m ,,||t|| = l /, 



n 



a 2 T X m 



vm'=l 



m'>m 



< C p (l + B-^(rn- x + ^ J)=2C P (1 + S- 1 



<r 2 T 



nm 



which concludes the proof. □ 



Proof of Lemma [5| Note that, by formula 1.110 of Gradshtein and Ryzhik (1980), 

j=o v 7 

so that, by definition of Frobenius norm, 

l|r,„((l- 2 )- r )||| = m 2 + (">-l) 2 (j) +(">-2) 2 ( > 2 

3=0 



+ 



r + m — 2 
m — 1 



iT m ((i-zr)n 2 



max 

1*1=1 



3=0 



m— 1 

E 

3=0 



r + j-1 
r - 1 
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If r = 1, then 



If r > 2, then 



E ( r + j " *) V - j) 2 = f> - j) 2 = m(m+1 f m+1) . 



^ (r + j - l\ _ (r - 1 + 1) . . . (r - 1 + j) x 



(r-l)l^l 3 ) = + 1 



so that, for m > 4, 



*/f(m) = ||T m ((l-z)- r )||2<0.5m 2 '-+ 1 , 

2 ^ 4 m 2r + l 2 -(4r-l) 

^ ^ [(^T)j]2(™--7) ^ [ (r _l)!]2 • 

j=m/4 



which proves validity of (6.14). To show that (6.15) holds, observe that, by formula 0.151.1 of 



Gradshtein and Ryzhik (1980), 

m— 1 



£ 

3=0 



r + j — 1\ / r + m — 1\ m /r + m—1 
r — 1 I \ r / ' r! 
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